ESSENCE Biosurveillance Systems


•  ESSENCE:  An  Electronic 
Surveillance  System  for 
the  Early  Notification  of 
Community-based  Epidemics 

•  Monitoring  health  care  data 

-  ~800 military treatment facilities since Sept. 2001

-  12  major  metropolitan  civilian  areas 


•  Evaluating  data  sources 

-  Civilian  physician  visits 

-  OTC  pharmacy  sales 

-  Prescription  sales 

-  Nurse  hotline/EMS  data 

-  Absentee  rate  data 

•  Developing  &  implementing  alerting 
algorithms 


Envisioned:  Decisions  Based  on 
Disparate  Evidence 
Environmental Data:
Climate 

Air/Water  Quality 
Allergen  Levels 


Syndromic  Time  Series 


•  Aberration  detection  algorithms 

-  Data  modeling:  multivariate  regression 

•  Covariates:  Holiday,  post-holiday,  trend,  provider 
count,... 

-  Statistical  process  control 

EWMA,  CUSUM  charts 

•  Combining  data  sources 

-  Multiple  univariate:  combine  p-values 

-  Multivariate: Hotelling's T² variants: MEWMA, ...
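The two univariate ideas named above, an EWMA control chart and the combination of per-source p-values, can be sketched as follows. This is only an illustration under stated assumptions: the slides do not give chart parameters or say which combination rule is used, so the baseline length, smoothing weight `lam`, limit `k`, and the choice of Fisher's method are all hypothetical.

```python
import math
from statistics import mean, stdev

def ewma_alerts(counts, baseline_len=28, lam=0.4, k=3.0):
    """Upper-sided EWMA chart on standardized daily counts.

    baseline_len, lam and k are illustrative values, not taken from the slides.
    """
    base = counts[:baseline_len]
    mu, sigma = mean(base), stdev(base)
    # Asymptotic standard deviation of an EWMA of unit-variance input.
    limit = k * math.sqrt(lam / (2.0 - lam))
    z, alerts = 0.0, []
    for day, c in enumerate(counts[baseline_len:], start=baseline_len):
        z = lam * (c - mu) / sigma + (1.0 - lam) * z
        if z > limit:
            alerts.append(day)
    return alerts

def fisher_combine(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2n degrees of freedom.

    Assumes the p-values come from independent tests.
    """
    n = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # statistic / 2
    # Closed-form chi-square survival function for even degrees of freedom.
    term, total = 1.0, 1.0
    for i in range(1, n):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

counts = [10, 12] * 14 + [30, 30]   # 28 quiet days, then a two-day spike
print(ewma_alerts(counts))
print(fisher_combine([0.05, 0.05]))
```

Fisher's method assumes independent sources; syndromic streams are often correlated, which is one motivation for the multivariate charts listed above.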
•  Evidence disparate in scale, variability, specificity, and timeliness

-  Syndromic: ED data specific, possibly late; OTC data nonspecific, potentially timely

-  Sensor: sparse spatial coverage; data gaps

•  Informatics  issues 

-  Differential  lags  in  signal  effect,  reporting 

-  Data  dropouts 

•  Differential  background  characterization 

•  Differential  signal  characterization 

•  Differential  information  value  (relevance) 
•  Graphical representation of conditional dependencies

•  Inclusion  of  disparate  evidence  types 

-  Continuous/discrete  data  or  derived  probabilities 

-  Expert/heuristic  knowledge 

•  Can  weight  statistical  hypothesis  test  evidence  using 
heuristics  -  not  restricted  to  fixed  p-value  thresholds 

•  Can  exploit  advances  in  data  modeling,  multivariate 
anomaly  detection 

•  Modularity  in  data  fusion  approach 

•  Management  of  missing  data 

•  Can  model 

-  Personal  weighting  of  evidence 

-  Lags  in  data  availability  or  reporting 
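As a rough illustration of how such a network fuses disparate evidence and tolerates missing data, the sketch below computes an outbreak posterior for a two-layer network (one outbreak node, several anomaly children). The structure, node names, and every probability are hypothetical; none of them come from the slides.

```python
def posterior_outbreak(prior, cpts, evidence):
    """P(outbreak | observed evidence) for a two-layer network.

    cpts[name] = (P(anomaly | outbreak), P(anomaly | no outbreak)).
    evidence[name] is True, False, or None (missing data).
    """
    like_out, like_none = prior, 1.0 - prior
    for name, observed in evidence.items():
        if observed is None:
            continue  # an unobserved child sums to 1 under either hypothesis
        p_out, p_none = cpts[name]
        like_out *= p_out if observed else 1.0 - p_out
        like_none *= p_none if observed else 1.0 - p_none
    return like_out / (like_out + like_none)

# Hypothetical conditional probabilities, for illustration only.
CPTS = {
    "ed_visits": (0.7, 0.05),
    "otc_sales": (0.6, 0.15),
    "sensor": (0.5, 0.01),
}

# The sensor feed has dropped out today; the other two sources still contribute.
today = {"ed_visits": True, "otc_sales": True, "sensor": None}
print(posterior_outbreak(0.05, CPTS, today))
```

A missing source simply drops out of the product, so a data dropout weakens the conclusion without blocking inference.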
Inhalational anthrax ... a biphasic clinical illness ...

1- to 4-day initial phase of malaise, fatigue, fever, myalgias, and
nonproductive  cough,  followed  by  a  fulminant  [sudden  and  severe] 
phase  of  respiratory  distress,  cyanosis,  and  diaphoresis  [sweating]. 
Death  follows  the  onset  of  the  fulminant  phase  in  1  to  2  days. 

John A. Jernigan, et al., “Bioterrorism-Related Inhalational Anthrax: The First 10 Cases
Reported in the United States,” Emerging Infectious Diseases, Vol. 7, No. 6,
November-December 2001


Data  from  the  Sverdlovsk  outbreak  indicate  a  modal  incubation  time  of 
approximately  10  days  for  inhalational  anthrax.  However,  the  onset  of 
symptoms  occurred  up  to  six  weeks  after  the  reported  date  of  exposure. 

Such  long  incubation  times  presumably  reflect  the  ability  of  viable  anthrax 
spores  to  remain  in  the  lungs  for  many  days.  Longer  incubation  periods  may 
be  associated  with  smaller  inocula. 

Terry C. Dixon, B.S., et al., “Anthrax,” NEJM, Volume 341:815-826, Number 11, September 9, 1999
Prior Probabilities


P(Flu Outbreak Occurring) = 0.05


P(Anthrax  Outbreak  Occurring)  =  0.001 
Pollen Interference

Fusion of anomalies

Temporal dependencies in syndromic data

Sensor/Environment Interactions
•  Availability of practical, verifiable data:

-  For  “truth  data”:  daily  clinical  diagnosis  counts 

-  For  “evidence”:  daily  environmental,  syndromic  data 

•  Known  asthma  triggers  with  complex  interaction 

-  Air  quality  (EPA  data) 

•  Concentration  of  particulate  matter,  allergens 

•  Ozone  levels 

-  Temperature  (NOAA  data) 

-  Viral  infections  (Syndromic  data) 

•  Evidence  from  combination  of  expert  knowledge, 
historical  data 
•  Ozone:
-  Burnett et al, 1994;
-  Sartor et al, 1995;
-  Stern et al, 1994;
-  Stieb et al, 1996;
-  Zhang et al, 2004 and others.

•  Particulate Matter (PM):
-  Anderson et al, 2001;
-  Chuersuwan et al, 2000;
-  Leaderer et al, 2003;
-  Howel et al, 2001;
-  Norris et al, 1999;
-  Ward and Ayres, 2004 and others.

•  Allergens:
-  Solomon, 2002;
-  Taylor et al, 2002;
-  Ziska et al, 2003 and others.

•  Viral Infections:
-  Hegele, 1999;
-  Cohen and Castro, 2003;
-  Lemanske, 2003 and others.

•  Cold Weather:
-  Anderson et al, 2001;
-  Jamason et al, 1997;
-  Packe and Ayres, 1985;
-  Sartor et al, 1995;
-  Schachter et al, 1981 and others.
Environmental Evidence: Allergen Levels and Diagnosis Counts
Asthma Diagnosis Counts and Pollen/Mold Level Over Time in the Baltimore-Washington Area
[Figure axis label: Daily Asthma Diagnoses]


Syndromic Evidence: OTC Sales and Diagnosis Counts
DC - Asthma visits (ICD-9 493) and Antihistamine Use
1.  All NCR county military and civilian asthma and provider counts are totaled.

2.  Regression algorithm seeks ‘anomalies’, taking into account:

•  Day of week

•  Holidays

•  Data trends

3.  Regression output is rescaled using a sigmoidal function designed to “stretch” out the high end of the regression output.

4.  Outputs > 0.9 are chosen as flare-up ‘seeds’ and extended three days before and one day after to generate “truth.”

[Figure panels: 1. Total NCR Asthma and Provider Count; 2. Regression; 3. Probability Map; 4. Unbiased Asthma Flare Ups. Time axis: 1/1/02 to 12/31/03.]
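The rescaling and seed-extension steps above can be sketched in a few lines. The slides specify only "a sigmoidal function", the 0.9 cutoff, and the window of three days before and one day after; the logistic form and its `midpoint` and `steepness` parameters below are assumptions made for illustration.

```python
import math

def rescale(residual, midpoint=2.0, steepness=1.5):
    """Logistic map of a regression residual into (0, 1).

    The logistic form, midpoint and steepness are hypothetical choices.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (residual - midpoint)))

def flare_up_truth(scores, threshold=0.9, days_before=3, days_after=1):
    """Mark seed days (score > threshold) plus the surrounding window."""
    truth = [False] * len(scores)
    for day, score in enumerate(scores):
        if score > threshold:
            start = max(0, day - days_before)
            stop = min(len(scores), day + days_after + 1)
            for d in range(start, stop):
                truth[d] = True
    return truth

scores = [0.1] * 10
scores[5] = 0.95                      # one seed day
labels = flare_up_truth(scores)
print([d for d, flag in enumerate(labels) if flag])
```

Each seed yields a five-day window, which matches the "5 day windows" quoted on the ROC slides.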


•  Structure  Learning 

-  Determining  nodes,  edges  of  graph:  what  are  the  effective 
relationships  (cond.  dependencies)  among  data  types,  other 
nodes?  (not  automated:  only  heuristic  structure  used) 

•  Parameter  Learning 

-  Maximum  Likelihood  Estimation  (MLE):  compute  CPTs  that  best 
explain  data  in  a  “brute  force”  frequency  density  sense 

•  Then Prob_MLE(data) = Prob(data | MLE CPTs)

-  Maximum  A  Posteriori  (MAP):  compute  CPTs  that  best  explain 
data  given  prior  CPT  estimates,  along  with  weights 

•  Then Prob_MAP(data) = Prob(data | MAP CPTs)
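For a single binary child with one binary parent, the two estimators can be sketched as below. With no prior the function returns raw frequencies (MLE). With a prior CPT entry `p0` and weight `w`, it returns `(hits + w * p0) / (n + w)`, which is the MAP estimate under a Beta(1 + w*p0, 1 + w*(1 - p0)) prior. That prior form, the function name, and the sample data are assumptions; the slides say only that prior CPT estimates and weights are used.

```python
from collections import Counter

def learn_cpt(samples, prior_cpt=None, prior_weight=0.0):
    """Estimate P(child=True | parent) from (parent, child) boolean pairs.

    prior_cpt maps parent state -> prior P(child=True | parent); prior_weight
    is the number of pseudo-observations the prior is worth (hypothetical API).
    """
    totals, hits = Counter(), Counter()
    for parent, child in samples:
        totals[parent] += 1
        hits[parent] += int(child)
    prior_cpt = prior_cpt or {}
    cpt = {}
    for parent in set(totals) | set(prior_cpt):
        weight = prior_weight if parent in prior_cpt else 0.0
        denom = totals[parent] + weight
        if denom > 0:  # skip states with neither data nor prior
            cpt[parent] = (hits[parent] + weight * prior_cpt.get(parent, 0.0)) / denom
    return cpt

# Made-up training pairs: (outbreak present, anomaly observed).
data = [(True, True)] * 3 + [(True, False)] + [(False, False)] * 4
mle = learn_cpt(data)
map_est = learn_cpt(data, prior_cpt={True: 0.5, False: 0.1}, prior_weight=4)
print(mle, map_est)
```

With sparse data the MLE can produce a hard zero (here for the no-outbreak row), which makes that evidence impossible under the model; the weighted prior keeps the entry away from zero.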
ROC curve for 2002

-  All NCR, military and civilian

-  Asthma “outbreaks”: 10 (auto) identified, 5 day windows

-  All-heuristic BBN performs very well

[Figure: ROC curve; axes: Sensitivity and Mean time between false alarms (days); curve label: Theoretical limit]

All bio-terror networks require heuristic parameters
ROC curve for 2002

-  All NCR, military and civilian

-  Asthma “outbreaks”: 10 (auto) identified, 5 day windows

-  Fusion of sensor data critical to sensitivity

[Figure: ROC curve; axis: Sensitivity]
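One point on a curve of this kind can be computed as sketched below. The scoring rules are assumptions, since the slides do not define them: an outbreak window counts as detected if any day in it alerts, and a false alarm is an alert on a day outside every window.

```python
def roc_point(scores, windows, threshold):
    """Return (sensitivity, mean days between false alarms) at one threshold.

    scores: daily network output; windows: (start, end) inclusive day indices.
    """
    alert_days = {d for d, s in enumerate(scores) if s > threshold}
    outbreak_days = {d for a, b in windows for d in range(a, b + 1)}
    detected = sum(
        any(d in alert_days for d in range(a, b + 1)) for a, b in windows
    )
    false_alarms = len(alert_days - outbreak_days)
    quiet_days = len(scores) - len(outbreak_days)
    sensitivity = detected / len(windows)
    mtbfa = quiet_days / false_alarms if false_alarms else float("inf")
    return sensitivity, mtbfa

# Made-up example: two 5-day windows in a 20-day record.
scores = [0.1] * 20
scores[3] = 0.8     # alert inside the first window
scores[15] = 0.7    # alert outside any window
windows = [(2, 6), (10, 14)]
print(roc_point(scores, windows, 0.5))
```

Sweeping `threshold` over the range of scores traces out the full curve.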
•  Inference and learning with BBNs are NP-hard

•  Heuristics  severely  constrain  problem 

-  Data  is  aggregated  to  increase  SNR 

-  Only  select  data  is  used  as  evidence 

-  Modularity  of  structure  allows  approximations  that 
reduce  computations 

•  Mean-field  approximations 
•  As a classifier, untrained heuristic-only BBN significantly
outperformed:

-  BBN  against  same  flare-ups  with  randomized  days  of  occurrence 

-  BBN  trained  with  data  by  MLE  from  random  initial  CPTs 

•  MLE  training  improved  heuristic-only  BBN  performance 
across  range  of  practical  false  alarm  rates 

•  Sensitivity  analysis  using  ROC  curve  analysis  can  reveal 
contributions  of  individual  data  sources;  fusion  with 
sensor  data  outperformed  syndromic  alone 

•  BBN  modeling  “works”,  but  for  effective  real-world 
performance,  development  of  tools  for  improving  graph 
structure,  parameter  learning,  and  prior  probabilities  is 
needed  along  with  underlying  data  analysis 
•  Application-related

-  Obtain  &  analyze  biosensor  data  for  background 
characterization 

-  Develop conditional probability tables (CPTs) for inclusion in BBN

•  BBN  Learning-related 

-  Evaluate  &  compare  parameter  learning  approaches 

-  Test  model  variations 

•  Validation-related  (with  improved  datasets) 

-  Temporal  cross-validation:  e.g.  application  of 
2003-based  CPTs  to  2004 

-  Spatial cross-validation: e.g. application of NCR-data-based
CPTs to San Diego, other areas
BACKUPS


$$\text{Posterior Probability} = \frac{\text{Conditional Likelihood} \times \text{Prior Probability}}{\text{Marginal Likelihood}}$$


Example: 

Posterior probability = Prob(anthrax attack | biosensor alert)
Conditional likelihood = Prob(biosensor alert | anthrax attack)
Prior probability = Prob(anthrax attack)
Marginal likelihood = Prob(biosensor alert)
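A worked instance of the four quantities above, using the anthrax prior of 0.001 quoted on the Prior Probabilities slide. The sensor's detection and false-alert rates are hypothetical numbers chosen only to show the arithmetic.

```python
def posterior(prior, p_alert_given_attack, p_alert_given_no_attack):
    """Bayes' rule for a single binary alert."""
    # Marginal likelihood: total probability of seeing an alert.
    marginal = (p_alert_given_attack * prior
                + p_alert_given_no_attack * (1.0 - prior))
    return p_alert_given_attack * prior / marginal

# 0.001 is the prior from the slides; 0.9 and 0.01 are assumed sensor rates.
p = posterior(prior=0.001, p_alert_given_attack=0.9, p_alert_given_no_attack=0.01)
print(round(p, 4))
```

Under these assumed rates a single alert leaves the posterior below 10 percent, because the low prior dominates; this is the arithmetic behind fusing several evidence sources before alerting.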
Evidence